INTRODUCTION

Evaluations conducted under the auspices of the federal
government are notorious for their lack of use in decisions
concerning the programs under scrutiny. After more than a
decade of costly evaluation of Follow Through, there is
little evidence that any decisions have rested on these
expenditures. The Follow Through program as a whole has been
remarkably impervious to change, even in the face of
repeated attempts by the Administration to terminate the
program. Twelve years after its inception, most of the
Follow Through sponsors remain the same and most of the
local districts in which they are implemented remain the
same. Moreover, there is little evidence of any
programmatic influence beyond the Follow Through sites
themselves, calling into question the R&D justification.

Although the Follow Through evaluation produced little 
information of immediate use to decision makers, it was far 
from useless. In fact, it was probably the single most
important experiment on the capacity of evaluation to solve
problems. The mammoth effort produced invaluable
information to the community of evaluators and researchers:
information about the limits of national evaluation
for educational decisions. The Follow Through evaluation
virtually introduced the notion of implementation of
treatment to the world of educational evaluation. It
also defined a host of issues from appropriateness of
available measures to the realities of maintaining
equivalent comparison groups and the limits of post hoc
statistical adjustments.

By virtue of defining and drawing attention to these
issues, the Follow Through evaluation generated information
that was useful, particularly to the research community. But
it was useful primarily because it was the first of its
kind, not because it was relevant to the questions around
which the evaluation was designed. The evaluation did
little to answer the questions it originally purported to
tackle: Does Follow Through work? Which approach works
best? The former was not answered because it was a
non-question. Without defining "Follow Through" and
"work," the question is meaningless. The evaluation also
did not ascertain which approach is best because of the
difficulties of comparison groups, measures, goals, etc. that
have been well documented elsewhere. This is not meant to
suggest that the evaluation was poorly planned or executed.
Those who conceived of the planned variations design (albeit
as a political move to adjust to drastic budget slashes)
took a rational approach in the light of what was then known
about defining and measuring educational successes. But the
world of educational evaluation and research on educational
improvement has changed significantly since the late 1960s.
We have learned a lot about the limits of evaluation, much
of it from the Follow Through experience. We have also
learned a lot about how schools change (or, perhaps more
accurately, about why schools don't change). We are
therefore in a much stronger position, in designing
evaluations for new waves of Follow Through approaches, to
develop evaluation strategies that have a high potential to
produce usable information.

The purpose of the remainder of this paper is to 
communicate some ideas about how an evaluation of new Follow 
Through approaches might be designed to maximize the 
usefulness of the information it generates. I begin with the
premise that the form of the evaluation must be derived from
the questions one is trying to answer, the audience(s) to
whom the results are directed, what we already know that is
related to the treatment under investigation, and the size
and shape (intensity, duration, number of sites, etc.) of the
treatment. I have not been able to obtain specific information
on any of these topics beyond what exists in the NIE
planning document of October 1, 1980. Therefore I have chosen
to make up some characteristics of the proposed first wave of
new Follow Through approaches in order to have some concrete
examples to draw upon in communicating my views about their
evaluation.

Drawing on the ideas contained in the NIE (Shiller et
al.) document of October 1, 1980, "Plans for Follow Through
Research and Development," I pretend that the first wave
of new approaches will consist of three different strategies
designed to increase the amount of time devoted to
instruction. For purposes of this discussion I define the
three strategies (loosely) as follows:

- Strategy A: An intervention that provides an intensive
in-service training program for teachers designed to
teach specific classroom management techniques that
minimize lost time.

- Strategy B: An intervention designed to form a
school-site council consisting of school staff and
community representatives whose charge is to design ways
of increasing the amount of time available for
instruction.

- Strategy C: An intervention that provides training to
principals in how to reorganize their schools (e.g., via
coordination of multiple programs) to protect the part of
the school day that is devoted to instruction from
interference.

Although these definitions of treatment are vague, they 
will serve the purpose of providing something concrete to 
refer to in the following discussion of possible approaches
to their evaluation. The discussion will consider the topics
given above from which the design of the evaluation must be
derived: the audience for the answers, the questions to be
answered, and the characteristics of the intervention.



Implicit under each topic is the notion that our prior 
knowledge suggests what can be achieved and thus should 
influence the choice of audience, questions, and type of
intervention. The topics are intimately interrelated and
thus difficult to treat separately. For convenience I
discuss first the audience (which subsumes the overriding
purpose of the evaluation), followed by the questions and
finally implications for the intervention itself. Then
I turn to implications for the design of the evaluation.

CONSIDERATIONS 

The discussion is intended to support the following
claims I make about evaluation:

1. An evaluation cannot be all things to all people;
questions and audience must be limited in advance.

2. The evaluation questions should be grounded in
what we know from previous research and
evaluation, including what is "answerable" given
existing measures.

3. The end result (outcome) of the intervention is of
little value without understanding how and why it
was or was not achieved.

4. The design and implementation of the new
approaches must be done in conjunction with the
evaluator.

The Audience 

Before one can determine what qualities would make an 
evaluation of the new Follow Through models useful, it is 
necessary to ask "useful for whom?" One of the biggest
problems that has besieged evaluation is that of
overpromising by claiming too many purposes. The history of
evaluations of ESEA Title I is fraught with illustrations of
the problem of trying to serve multiple audiences, each with
a different stake in the program and hence in the results
of the evaluation, with a single evaluation. Who are the
audiences for an evaluation of new Follow Through
approaches? Presumably there are potential users at all
levels of the educational system: federal, state, and
local. Within each level, as well, there are potential
users with different information needs (that is, with
different questions that the evaluation might answer for
them). For example, at the federal level, there is an
audience in Congress, an audience which probably contains
various viewpoints but which pretends to speak somewhat as
one in legislation. The Congressional audience views Follow
Through as a service program and is thus interested in the
question of whether the program is serving intended
beneficiaries and the quality of those services. There is
also an audience in the Administration; in fact, there are
probably multiple audiences in the Administration since
there are at least two agencies involved in Follow Through
and different agencies usually have different agendas.
There may even be more than one audience within the program
office itself, since there are likely some staff who view
Follow Through as a service program and are thus primarily
interested in ascertaining whether the program is being
administered properly and whether the services are being
delivered, while others view the primary purpose as research
and development and might want to develop and test various
new approaches.

The closer one gets to the operating program itself, 
the more certain questions change from those asked
several levels above the program. A district administrator
or program director is interested in what programs would
'work' in his/her district or subset of schools. A
principal with several Follow Through classes in his/her
school is likely interested in those classrooms as a
unit, and even in their impact on other parts of the school.
A teacher or a parent aide or a classroom specialist of some
sort is interested in questions pertaining to the particular
classroom. Many parents are interested just in their child
and find any assessment of a larger unit not particularly
relevant.

I would urge that the first step in designing an
evaluation of the new Follow Through approaches should be to
decide upon the primary audience of interest; do not try to
meet all the information needs of all the actors. NIE has
begun this process and their documents indicate that the
primary audience of interest is the local decision maker. I
offer strong support for this choice, having initially
embarked, for this very paper, on an attempt to identify the
federal policy issues in Follow Through in order to make
inferences about what types of evaluations of the new
approaches would be most useful to federal policy makers. I
found it exceedingly difficult to identify policy issues
beyond those that have existed since the birth of Follow
Through and will continue until its demise. Should Follow
Through be a service program or a research program? This is
not a question for which evaluation can provide reasonable
input. This is a question that subsumes issues of equity, of
values, and of political support and commitment. Never has
the debate between service and research been cast (nor could
it be) as a researchable question. "Does Follow Through
work?" is similarly distant from evaluation. This is the
false evaluation question around which the first evaluation
was built and continues to be, at that level, a non-question.
Only with much greater specificity does this
question become researchable.

Similarly, the question of which approach works best,
though often cited as the federal R&D question of interest,
is not answerable, as our experience with the first round of
models has shown. Therefore, NIE's goal of informing local
decision makers of promising management strategies seems a
reasonable choice of audience.

The Questions

The choice of questions to be answered by an evaluation 
is intimately related to the choice of audience and our 
current knowledge about the issues. If local decision
makers are to be the primary audience for the evaluation, the
design must reflect what is of interest to them, constrained
or elaborated by what we as researchers know are the limits
of evaluation (e.g., what can be measured well) and factors
we know to be important from previous research on school
change and improvement. To illustrate this I'll draw on the
three hypothetical strategies presented above designed to
increase instructional time (A through teacher in-service, B
through school-site councils, and C through training
principals) and assume each is implemented in several
schools (say, two in each of three districts). I will take
for granted all the arguments that have been presented over
the years about the limited usefulness of the planned variations
approach. Hence, I assume that determining which approach is
"best" according to some predetermined set of outcome measures
is not only impossible to implement (because of lack of
comparability) but, as I will argue below, not relevant to the needs
of any audience.

If it's not a horserace, then what is the comparison of
interest? Logically, the comparison of interest might be
within a strategy: did the amount of instructional time
increase? Suppose, for the sake of argument, that five
schools with Strategy A had increases in instructional time
while three each with Strategy B and C showed increases.
(Assume that, in each, instructional time is measured prior to
the manipulation, say spring 1982, and after a year, spring
1983.) What information of use does this set of findings
communicate? I suggest that virtually nothing would have been
learned, not because the results were mixed (which, however,
is virtually inevitable), but because what is of use to
someone else trying to increase instructional time is WHY and
HOW a particular approach worked, not whether it worked.
WITHOUT THE HOW AND WHY, IT IS IMPOSSIBLE TO MAKE INFERENCES
ABOUT WHETHER IT WOULD WORK IN ANOTHER SETTING.
Understanding what factors in the context facilitated or hindered
the attempt to increase instructional time is critical
information for one trying to implement an approach in a
particular context beyond those studied. Outcome
measures by themselves are of little value in decisions
of this type.

The finding most widely agreed upon and cited from the 
Follow Through Planned Variations experiment supports this
argument; to wit, the finding that there was as much variation
between sites within sponsor as between sponsors. In addition
to this source, a growing number of studies in recent years
have confirmed the notion that the particular features of the
context in which a change is being implemented are
overwhelmingly associated with the success or failure of the
attempted change. (See references.)

Returning to the audience of interest, let's presume
that the target audience is a district level administrator,
since the district is the grantee in Follow Through. What
does such a person want to know? Typically, thoughtful
administrators want to know how Approach X worked in a
district (school) similar to theirs. Such an administrator
might claim interest in knowing only (a) whether there is an
outcome measure that documents success and (b) whether the
district or school studied has characteristics or
circumstances similar to theirs. Most administrators are
sophisticated enough to know that what works well in one
situation may not work at all in a completely different
setting (e.g., one which has a strong union, a recent
earthquake, or a Spanish-speaking population as compared to one that
doesn't).

Those who are more sophisticated may recognize that
success also depends upon the staff involved, their desire
and ability to accommodate change (is this the tenth
innovation in five years?), the match between what they are
currently doing and the proposed change, and generally what
new demands will be placed upon all involved in implementing
the new approach. This might include the particular
educational philosophy of a particular principal or of a
group of parents active in improving the schools.

Design of the Intervention

The primary purpose of this paper is to suggest how a
new wave of Follow Through approaches should be evaluated.
It is beyond the paper's scope to suggest how the
interventions should be designed and implemented; however,
the evaluation is inextricably linked to the design of the
intervention. Therefore, I want to communicate some
considerations that should go into the design of the pilot.
The first consideration is the role of the evaluator. It is
critical for the evaluator to be involved in the design of
the intervention. Because the primary purpose of the pilot
is to learn about certain interventions, the choice of
audience, questions, measures, and prior knowledge should all
affect the shape of the interventions themselves, not just
the evaluation. If NIE designs certain interventions and
determines the sites in which they will be implemented and
the duration of the intervention, AND if these decisions are
made without regard to the eventual audience for the
evaluation, questions to be answered, etc., there is little
point in having an evaluation. Even if NIE does consider all
the above in their design, unless NIE is to conduct the
evaluation itself, few evaluators would be happy to step into
a situation with so many constraints already imposed.

A second consideration in designing the intervention is
that of duration. We are in a rapidly changing world.
Given declining enrollments, school closings, reduced
funding, population shifts, etc., studying the process of
change over a period of years may have little relevance for
the world that will then exist. This suggests that a ten
year undertaking is not useful (especially given administrative
considerations such as changing leadership, evaluation
staff, and so on). Moreover, with most Congressional cycles
running under three years, it makes little sense to design
studies dependent upon federal funding that exceed these
cycles (from the initial design to the final report, not just
data collection).

In view of these severe time constraints, it makes sense
to consider ways of obtaining intermediate results. If the
ultimate outcome is increased student achievement, which
seems inevitable, it would be valuable to design
interventions for which there are intermediate outcomes of
interest. For example, interventions designed to increase
achievement through increased time devoted to instruction
could report on increases in instructional time prior to
measuring achievement.

Finally, as the example of instructional time suggests, 
the intervention should be shaped by the reality of schools
as organizations. Given the importance of context, it makes
sense to take that context into consideration in designing
interventions rather than shaping the intervention in terms
of children without regard to context. Trying to change the
context in which children learn (which requires dealing with
the school as an organizational entity) rather than changing
the children might result in interventions which have a
greater chance of producing change. (See Henry Acland, "On s
Structure".)

IMPLICATIONS FOR EVALUATION

In the preceding sections I have suggested that a 
useful evaluation must be focused on a particular audience
and a particular set of questions that are relevant to that
audience. In particular, I chose local decision makers as
the primary audience of interest and justified the need to
understand the process of implementation and change by
appealing to their information needs, which are context
specific. Before moving on to the implications for evaluation
of these choices, I submit that this type of
understanding is equally relevant to the federal decision
maker. Although certain federal decisions require national
data (populations or samples), such as questions of prevalence
and other numerical information, federal actors in the
business of designing and administering programs to improve
education need to understand how the context affects the
process of implementation and change in order to make federal
policy that will be effective. In this section I try to show
the limitations of traditional experimental and ethnographic
approaches in producing this type of information and propose
in their place an approach that minimizes the weaknesses of
each approach.

Limitations of Experiments and Case Studies

The presumed advantages of an experimental approach 
lie in its potential to isolate causal factors and to
produce generalizable findings. These advantages can only be
achieved under ideal conditions, e.g., random assignment to
treatment and random sampling from a well-defined population.
Random assignment is seldom feasible and post hoc
compensations are far from satisfactory for isolating causal factors.
Random sampling is feasible but requires, for most purposes,
that the sample be large and the factors of interest
be precisely specified in advance. If we could list the
ten factors most likely to explain differences in agreed
upon outcomes of interest AND list in question form the fifty
factors that reflect our best guesses as to additional
explanatory factors, we could define a sample that might
result in the desired information, IF we were right in our a
priori choice of variables. We could choose the sample to represent
variation on the first ten and design survey instruments based
on the fifty. If we were wrong, however, we would have learned
very little at considerable expense. Unfortunately, the
evidence suggests we WOULD be wrong since we are just
beginning to learn what factors are important to understand
in predicting how humans change their behavior in complex
organizational settings.

We have already learned that the factors that we originally 
expected to be good predictors of educational outcomes
(sex, age, and the usual raft of other child and teacher
characteristics) are woefully inadequate in predicting the
results of attempts to change. From a number of change
efforts, we have learned that predictors of change include
such factors as how a school is introduced to an innovation,
whether school staff were part of the decision to implement a
given strategy, and whether the strategy was compatible with
ongoing enterprises in the school. These types of explanations
suggest that the important correlates of change cannot be
captured by simple two and three-way interactions.

On the other hand, traditional ethnographic or case
study approaches, while able to yield more rich and relevant
information, are of minimal use to federal policy makers
concerned with making statements applicable to the nation.
Although arguments exist for generalizing from a single case
(see Kennedy, "Generalizing From a Single Case Study"), a
federal decision maker would have difficulty defending
policy based on an evaluation of a single case.

Another limitation of case studies is the quantity and
type of information obtained. Ethnographers are loath to
constrain data collection in advance by preconceived notions.
When one does field work without a carefully predefined
structure that does in fact constrain data collection, the
result is a mammoth amount of undigested information:
information that may be irrelevant, untrustworthy, and
extremely difficult to decipher. Case study investigators are
also loath to draw conclusions, preferring to leave all
inferences to the reader, an extremely burdensome task for a
decision maker.

But suppose we blend the two approaches (experimental
and case study) in a way which combines the structure and
generalizability of the traditional experiment with the
richness and relevance of the information gained in a case
study approach. I call such an approach the "multiple case
study approach," which is actually a shortened form of
"multiple site, structured case study approach." (This
phrase and the discussion that follows draw heavily on the
ideas presented in Greene and David, "Generalizing from Multi
Case Studies," in which the supporting arguments are more
fully explicated.)

The Multiple Case Study Approach 

The multiple case study approach rests on building 
a conceptual framework at the beginning that lays out a
map of the territory, followed by careful sample selection
done purposefully to ensure variation on certain factors of
importance. The conceptual framework also serves to
structure data collection, ensuring that the data are
comparable so that the analysis, which looks for
similarities and differences across cases, can be conducted
with integrity.

Conceptual Framework 

A carefully developed conceptual framework is the
backbone of a multiple case study approach. I hesitate to
use this phrase because it conjures up images of the
obligatory literature reviews and references to theory seen
at the beginning of many research reports and never referred
to again. The conceptual framework of which I speak serves a
far more pervasive role in the conduct of the study. At the
beginning, the conceptual framework serves to organize
existing knowledge about school change. In the illustrations
given above, the conceptual framework would draw on the
general literature of school change as well as research
specifically concerned with increasing instructional time.

The conceptual framework serves two critical functions.
First, it limits the topics on which data will be collected.
Second, it presents the context in which the data should be
interpreted. By identifying the important components of the
intervention and the environment in which it is implemented,
the conceptual framework reduces to a manageable set and
makes concrete an otherwise unbounded set of concepts (the
whole field of personal and organizational change). From
this set are generated the topics on which data will be
collected and the classes of appropriate respondents. For
example, a topic might be the role of the principal in the
school (the principal as a source of explanation for change
or lack thereof in teacher behavior). In a particular school
the interview with the principal would be in part determined
by the findings about change in the teachers. Hence the
questions would be different from one school to the next,
depending upon what else was going on in the school.

Through identifying the topics and the context that
shapes their meaning, the conceptual framework structures
and limits the data collection but does not constrain it a
priori to specific questions, as do survey instruments.
Moreover, it serves to communicate to readers of the final
report the particular viewpoint of the evaluator so that the
meaning of the findings can be judged. As such, the
conceptual framework also serves as a guide to sample
selection and as the basis for site visitor training to
ensure comparable data. Each of these areas is described
below.

Sampling

Careful sample selection is critical in conducting a 
multiple case study for it is the basis upon which 
generalizability of the findings will be justified. It must
be done purposefully, drawing on elements of the conceptual
framework. In structure it is analogous to sample selection
in an experiment. Just as we would incorporate into any
experimental or quasi-experimental sampling plan those
factors anticipated in advance to be powerful explainers, so
would we here, but with one important exception. We need to
ensure variation on those factors expected to explain
outcomes, but we don't need to include all combinations of
all levels of each factor.

For example, suppose we want to implement the school-site 
council concept in six schools. We have a pretty good idea 
that:

a. Teacher support for such a council is an important
   predictor of its eventual success.
b. It is easier to implement changes in a small
   school than in a larger one.
c. Administrative support is important in instituting
   a planning and decision making group.
d. Staff stability is important to the creation and
   maintenance of such a council.

Obviously, each of these factors can be measured in a variety
of ways and with varying degrees of confidence. For purposes
of sample selection, however, it makes sense to limit the
factors to those that can be measured in advance with ease and
confidence. Hence, for example, I would eliminate teacher
support because it is extremely difficult to measure
accurately from a distance. And unless one can develop a good
proxy for administrative support that can be measured at long
range, I would be equally wary. School size and staff
stability, on the other hand, can be measured at a distance
with relative ease and accuracy. NIE can use this type of
information to select a sample (both for the implementation of
new approaches and for their evaluation) according to one of
two strategies.

One strategy is to select sites for the pilot that are
high on all the factors anticipated to affect the desired
changes. This rests on the argument that it is so easy to
have NO effect in educational manipulations that it makes
most sense to stack the deck in advance as much as possible.
This argument says that you will learn more from successes
than failures, not only because failure is so common but
because there is no way to isolate reasons for failure and
therefore to make valid inferences from such cases. Hence,
maximizing the chance for success is important for
constructive learning. District administrators need to know
far more what is likely to facilitate change than what is
likely to pose barriers.
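
As a minimal sketch of this first strategy, the fragment below keeps only candidate sites rated favorable on every selection factor. The site names, the two factors, and the "favorable"/"unfavorable" ratings are all made up for illustration; they are not drawn from any NIE data, and the function name is hypothetical.

```python
def pick_favorable_sites(candidates, factors, n_sites):
    """Return the first n_sites candidates rated favorable on every factor."""
    favorable = [
        site for site in candidates
        if all(site[factor] == "favorable" for factor in factors)
    ]
    if len(favorable) < n_sites:
        raise ValueError("not enough sites are favorable on all factors")
    return favorable[:n_sites]


# Hypothetical pool; ratings would come from records available at a distance.
pool = [
    {"name": "School 1", "size": "favorable", "stability": "favorable"},
    {"name": "School 2", "size": "favorable", "stability": "unfavorable"},
    {"name": "School 3", "size": "favorable", "stability": "favorable"},
    {"name": "School 4", "size": "unfavorable", "stability": "favorable"},
]

chosen = pick_favorable_sites(pool, ["size", "stability"], n_sites=2)
print([site["name"] for site in chosen])
```

The sketch only automates the filtering step; justifying the ratings themselves remains a matter of judgment.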

For this approach to be useful, it is necessary to
demonstrate that the selection procedures are likely to yield the
"best" cases. Thus one must justify the choice of selection
factors and the choice of sites representing high levels on
the factors (as judged, for instance, by a consensus of
experts and practitioners). If the selection process is
defensible, then this approach maximizes the likelihood of
finding successes and in essence provides a test of the
hypothesis that the intervention can bring about change.
This approach does NOT, however, provide a basis for
generalizing about the conditions under which success
will occur. In this context, it is an exploratory study;
if there are successes, one can speculate about their
explanations. A different approach, described next, is needed
to confirm the explanations for change.

The second strategy is to select a sample to achieve
maximum variation on the factors of interest (where maximum
means representative of the full range of variation in
the nation). Suppose that we have chosen school size and staff
stability as the selection factors, with a high, medium, and
low instance of each factor. We do not need to include all
possible combinations but we do need to ensure that the sample
in which the intervention will be tried contains a high, medium
and low school on each of the two factors.
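
To make the contrast with a full factorial design concrete, here is a minimal sketch, under assumed data, of choosing sites so that every level of each factor appears at least once. The candidate schools and their ratings are hypothetical, and the greedy rule is only one simple way to build such a sample, not a prescribed procedure.

```python
from itertools import product

LEVELS = ("high", "medium", "low")
FACTORS = ("size", "stability")


def covers_all_levels(sample):
    """True if every level of every factor appears at least once."""
    return all({site[f] for site in sample} >= set(LEVELS) for f in FACTORS)


def pick_covering_sample(candidates):
    """Greedily add the site that fills the most missing (factor, level) pairs."""
    missing = set(product(FACTORS, LEVELS))
    remaining = list(candidates)
    sample = []
    while missing:
        if not remaining:
            raise ValueError("candidates cannot cover every level")
        best = max(
            remaining,
            key=lambda s: len({(f, s[f]) for f in FACTORS} & missing),
        )
        gained = {(f, best[f]) for f in FACTORS} & missing
        if not gained:
            raise ValueError("candidates cannot cover every level")
        sample.append(best)
        remaining.remove(best)
        missing -= gained
    return sample


# Hypothetical candidate pool, rated at a distance on the two factors.
candidates = [
    {"name": "School 1", "size": "high", "stability": "high"},
    {"name": "School 2", "size": "high", "stability": "low"},
    {"name": "School 3", "size": "medium", "stability": "low"},
    {"name": "School 4", "size": "low", "stability": "medium"},
    {"name": "School 5", "size": "medium", "stability": "medium"},
]

sample = pick_covering_sample(candidates)
print([site["name"] for site in sample])
```

With this pool, three schools are enough to show each level of both factors, whereas including all combinations of the two three-level factors would take nine.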

Since the selection factors usually must be measured at a
distance, they will be serving as proxies for other sources
of variance that may be either too difficult to measure from a
distance, impossible to define precisely, or factors for which
a relationship is known to exist but nothing is known about
the form of that relationship. Hence, school size may have
broad support among researchers as an important concomitant
of ability to change without their having much idea of the 
intervening processes through which school size affects the 
process of change. With a range of school size in the sample, 
one may discover, for example, that an important key to 
implementation is frequent informal communication with the 
principal and that this is simply easier to accomplish with a 
faculty of 12 than a faculty of 30. 

But what allows us to generalize such a finding to sites
not included in the sample? First, we must have confidence
in the finding for the sample. This means that we have
looked at situations in which the likely explanatory factors
vary enough to draw conclusions about which ones are in fact
having the greatest effect, and under what limiting conditions.
On this basis, we should be able to convince other researchers
that our explanation is plausible and that we have considered
and rejected all plausible alternatives. These claims are
not ultimately provable (in any paradigm); they are judgments
made by knowledgeable persons, based on their abilities to think
of alternatives, to test them, and to persuade or be persuaded
that the findings are valid. (In an experiment, these
judgments must be made before data are collected and form the
basis for the hypotheses to be tested and the factors on which a
sample is selected.) Given a valid finding in the sample, the
basis for generalizing to other sites rests on evidence that
the sample contains variation on plausible explanatory
factors representative of the variation that exists in the
population of interest.


Data Collection 

So far we have developed a conceptual framework that
identifies those elements that we deem important in
evaluating the impact of new Follow Through approaches. We
have also chosen a sample on the basis of those factors in
the conceptual framework that are widely agreed to affect the
impact and that are measurable from a distance. The next
step is to translate our preconceived notions of what is
interesting into a data collection strategy. We seek
comparability across units in a multiple case study design
for the same reason as in an experimental design: to be
able to make valid inferences about the relationships
between differences in outcomes and differences in
conditions.

In any case study data collection, the data collector
(or site visitor) himself or herself should be viewed as
the data collection instrument. In a multiple case study
design, the interview guide provides the structure within
which data will be collected. The guide is derived from
the conceptual framework and organized by the questions to
be answered in the analysis. It is only a guide, however,
and as such says little about the necessary amount of detail,
the particular respondents to include, the ways to
ask questions, or how to know when enough information has
been obtained on a given topic.

Given the context-dependent nature of the data, specific
answers to these issues will vary from site to site, but
site-specific guides would foreclose comparability of data
across sites. Therefore it is necessary to train the data
collectors so that they have a shared understanding of
the purposes of the data collection (as well as specific
skills for maximizing the validity of the data that are
collected). Shared understanding can best be accomplished
through involving the data collectors in the development of
the conceptual framework and interview guide. A shared
viewpoint and understanding of what constitutes reliable and
valid data cannot be accomplished in one-shot training
sessions but must evolve from continuous immersion in the
concepts and goals of the study. Specific skills for
maximizing internal validity can be imparted in a more
structured way through formal training and rehearsal in
methods such as cross-examination and triangulation.
Through simulations of on-site data collection, data
collectors can gain experience in probing, in developing
multiple approaches, and in drawing inferences on the basis
of multiple perspectives.

The particular purposes of each study will dictate how
the data are transcribed from field notes. To the extent
that an interview guide reflects the categories in which the
analyses will be reported, it is useful to write up the
field notes in the format outlined by the interview guide.
Whether or not this is the case will be a function of how
closely the conceptual framework matches the reality found
in the sites. In conducting a multiple case study, it is a
tremendous advantage to have the luxury of longitudinality,
at least to the extent that more than one wave of data
collection can be conducted, even if within one school year.
If additional waves of data collection are possible, then
the conceptual framework can be revised after each visit to
better reflect reality. This can compensate somewhat for
omissions in the original conceptual framework or
anticipated relationships that are not supported by the data.

Analysis

The first important stage of analysis occurs while
the data collectors are in the field. Each data
collector, in the process of gathering information through
interviews and observations, has implicitly generated and
tested innumerable hypotheses. Choices about whom to
interview, what questions to ask, how far to go, etc.
are made on the spot, based on the data collector's
knowledge, experience, and judgment of what there is to
be learned. The data collector is constantly developing
hunches about connections between events and choosing
questions that will test those hunches. The hunches (or
operating hypotheses) are constantly revised, retested, and
revised and retested again until the data collector has
confidence through multiple sources and perspectives that
the story is internally consistent and inherently plausible.

The second stage of analysis occurs after a round of
data collection. The analyst(s) must first become familiar
with each case and draw conclusions from individual cases
so as to connect the features of the local context with the
change being studied. Then the analyst(s) conducts pairwise
comparisons in which tentative conclusions based on one case
are systematically tested against each of the other cases.
The purpose of these case-by-case comparisons is to fine
tune, modify, and refine the propositions so that they are
expressed precisely to reflect the limiting conditions
revealed by the patterns of findings across all the cases
(e.g., x is true in large schools or y is true in schools
with strong principals). If the amount of modification
required to make a proposition hold in all instances is
excessive, amounting to a completely site dependent
phenomenon, the proposition is dropped as uninteresting.
The conclusions that remain after this obstacle course
of pairwise comparisons are finally presented with
illustrations drawn from the cases, in a clear and
concise form that can be easily read and understood by the
audience of primary concern.
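
The bookkeeping behind these comparisons can be sketched in Python, with the caution that the real work is a matter of analyst judgment and not of computation. Everything in the sketch is an assumption made for illustration: the four cases, their attributes, and the simplifying rule that a proposition is kept only if it holds everywhere or if a single context factor cleanly separates the cases where it holds from those where it fails.

```python
def refine(proposition, cases, context_factors):
    """Qualify a proposition with a limiting condition, or drop it (None)."""
    holds = [c for c in cases if proposition["test"](c)]
    fails = [c for c in cases if not proposition["test"](c)]
    if not holds:
        return None  # supported by no case at all
    if not fails:
        return {"claim": proposition["claim"], "condition": None}
    for factor in context_factors:
        holding_values = {c[factor] for c in holds}
        failing_values = {c[factor] for c in fails}
        # A factor is a usable limiting condition only if its values
        # cleanly separate the supporting cases from the others.
        if holding_values.isdisjoint(failing_values):
            return {"claim": proposition["claim"],
                    "condition": (factor, sorted(holding_values))}
    return None  # too site dependent under this simplified rule


# Invented cases for illustration only.
cases = [
    {"site": "S1", "school_size": "small", "principal": "strong",
     "frequent_contact": True, "implemented": True},
    {"site": "S2", "school_size": "small", "principal": "weak",
     "frequent_contact": True, "implemented": True},
    {"site": "S3", "school_size": "large", "principal": "strong",
     "frequent_contact": False, "implemented": False},
    {"site": "S4", "school_size": "large", "principal": "weak",
     "frequent_contact": False, "implemented": True},
]
contact = {"claim": "teachers have frequent informal contact with the principal",
           "test": lambda c: c["frequent_contact"]}
implemented = {"claim": "the intervention was implemented",
               "test": lambda c: c["implemented"]}

factors = ["school_size", "principal"]
print(refine(contact, cases, factors))
print(refine(implemented, cases, factors))
```

With these invented cases, the first proposition survives with the qualifier that it is true in small schools, while the second is dropped because neither factor accounts for the one site where it fails.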



CONCLUSIONS 



To maximize the usefulness of what is learned from a new
wave of Follow Through approaches, the design of both the
pilot and its evaluation should be grounded in reality. The
two most salient features of reality are first, that
attempts to change educational practice are context dependent
and second, that there are limits on what CAN be learned about
the effects of an intervention. Therefore, this paper has
presented arguments in support of the following
recommendations:

-The pilot (including the characteristics of the
intervention, its scope and duration) should be designed in
conjunction with the evaluation.

-The audience for the evaluation and the questions to be
answered should be focused and realistic (both in terms
of the types of information useful in decisions and in
terms of what we can answer).

-The primary goal should be to understand how and why an
intervention works, not just the end result. The
evaluation should rest on the assumption that the
context is something to be examined, NOT something to
be controlled.

-The proposed multiple case study approach provides a
way of maximizing the usefulness of the data collected
without sacrificing generalizability.

No experimental approach will result in unambiguous
findings; neither will a multiple case study approach. Both
approaches rest ultimately on the experience, knowledge, and
ability of the evaluator(s). There are clearer and more
generally accepted rules for conducting experiments than
there are for conducting multiple case studies. But these
rules, which are inevitably broken in field experiments, are
not designed to elicit the types of data that are likely to
be used by decision makers trying to improve educational
experiences for children. Therefore, it makes sense to move
in the direction of refining the type of methodology that is
built around increasing our understanding of a complex world
and hence more likely to produce information of immediate
use.